Alternative Approaches for Cross-Language Text Retrieval

Author

  • Douglas W. Oard
Abstract

The explosive growth of the Internet and other sources of networked information has made automatic mediation of access to networked information sources an increasingly important problem. Much of this information is expressed as electronic text, and it is becoming practical to automatically convert some printed documents and recorded speech to electronic text as well. Thus, automated systems capable of detecting useful documents are finding widespread application.

With even a small number of languages it can be inconvenient to issue the same query repeatedly in every language, so users who are able to read more than one language will likely prefer a multilingual text retrieval system over a collection of monolingual systems. And since reading ability in a language does not always imply fluent writing ability in that language, such users will likely find cross-language text retrieval particularly useful for languages in which they are less confident of their ability to express their information needs effectively.

The use of such systems can also be beneficial if the user is able to read only a single language. For example, when only a small portion of the document collection will ever be examined by the user, performing retrieval before translation can be significantly more economical than performing translation before retrieval. So when the application is sufficiently important to justify the time and effort required for translation, those costs can be minimized if an effective cross-language text retrieval system is available. Even when translation is not available, there are circumstances in which cross-language text retrieval could be useful to a monolingual user. For example, a researcher might find a paper published in an unfamiliar language useful if that paper contains references to works by the same author that are in the researcher's native language.

Multilingual text retrieval can be defined as selection of useful documents from collections that may contain several languages (English, French, Chinese, etc.). This formulation allows for the possibility that individual documents might contain more than one language, a common occurrence in some applications. Both cross-language and within-language retrieval are included in this formulation, but it is the cross-language aspect of the problem which distinguishes multilingual text retrieval from its well-studied monolingual counterpart.

At the SIGIR workshop on Cross-Linguistic Information Retrieval, the participants discussed the proliferation of terminology being used to describe the field and settled on "Cross-Language" as the best single description of the salient aspect of the problem. "Multilingual" was felt to be too broad, since that term has also been used to describe systems able to perform within-language retrieval in more than one language but that lack any cross-language capability. "Cross-lingual" and "cross-linguistic" were felt to be equally good descriptions of the field, but "cross-language" was selected as the preferred term in the interest of standardization. Unfortunately, at about the same time the U.S. Defense Advanced Research Projects Agency (DARPA) introduced "translingual" as their preferred term, so we are still some distance from reaching consensus on this matter.


Similar resources

Cross-Language Text Retrieval Research in the USA

The increasing availability of networked access to multilingual text collections has generated increased interest in the development of effective and efficient cross-language text retrieval technology. Examples of cross-language text retrieval applications are discussed and a classification of known approaches is introduced. This is used to structure a comprehensive discussion of published research...


Document Translation for Cross-Language Text Retrieval at the University of Maryland

The University of Maryland participated in three TREC-6 tasks: ad hoc retrieval, cross-language retrieval, and spoken document retrieval. The principal focus of the work was evaluation of a cross-language text retrieval technique based on fully automatic machine translation. The results show that approaches based on document translation can be approximately as effective as approaches based on qu...


Document Translation for Cross-Language Text Retrieval at the University of Maryland

The University of Maryland participated in three TREC tasks: ad hoc retrieval, cross-language retrieval, and spoken document retrieval. The principal focus of the work was evaluation of a cross-language text retrieval technique based on fully automatic machine translation. The results show that approaches based on document translation can be approximately as effective as approaches based on query tra...


Language-Dependent and Language-Independent Approaches to Cross-Lingual Text Retrieval

We investigate the effectiveness of language-dependent approaches to document retrieval, such as stemming and decompounding, and contrast them with language-independent approaches, such as character n-gramming. In order to reap the benefits of more than one type of approach, we also consider the effectiveness of the combination of both types of approaches. We focus on document retrieval in ni...


Building parallel corpora by automatic title alignment using length-based and text-based approaches

Cross-lingual semantic interoperability has drawn significant attention in recent digital library and World Wide Web research as the information in languages other than English has grown exponentially. Cross-lingual information retrieval (CLIR) across different European languages, such as English, Spanish, and French, has been widely explored; however, CLIR across European languages and Orienta...




Publication date: 1997